Chronic shedding of a SARS-CoV-2 Alpha variant in wastewater

Background: Central Michigan University (CMU) has participated in a state-wide SARS-CoV-2 wastewater monitoring program since 2021. Wastewater samples were collected from on-campus sites and nine off-campus wastewater treatment plants servicing small metropolitan and rural communities. SARS-CoV-2 genome copies were quantified using droplet digital PCR and results were reported to the health department.
Results: One rural, off-campus site consistently produced higher concentrations of SARS-CoV-2 genome copies. Samples from this site were sequenced and contained predominantly a derivative of Alpha variant lineage B.1.1.7, detected from fall 2021 through summer 2023. Mutational analysis of reconstructed genes revealed divergence from the Alpha variant lineage sequence over time, including numerous mutations in the Spike RBD and NTD.
Conclusions: We discuss the possibility that a chronic SARS-CoV-2 infection accumulated adaptive mutations that promoted long-term infection. This study reveals that small wastewater treatment plants can enhance resolution of rare events and facilitate reconstruction of viral genomes due to the relative lack of contaminating sequences.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-024-09977-7.


Introduction
Wastewater surveillance became an important public health tool during the COVID-19 pandemic. Wastewater surveillance programs identified outbreaks within communities and individual buildings, and they are increasingly being used to detect variants of concern [1-11]. The goal of wastewater surveillance is to provide data to public health agencies so that they may make informed decisions regarding mitigation strategies such as physical distancing, masking, business closures, and distribution of resources such as prophylactic vaccines.
The State of Michigan Department of Health and Human Services (MDHHS) initiated a wastewater surveillance program in 2021. The program included partnerships between academic laboratories and regional public health departments that spanned large and small metropolitan areas and rural areas in both the lower and upper peninsulas. Central Michigan University (CMU) formed a partnership with the Central Michigan District Health Department (CMDHD). This partnership provided an opportunity to look at the dynamics of SARS-CoV-2 at a public university and in the surrounding small metropolitan and rural communities [12]. We identified ten on-campus sewer sites and nine off-campus wastewater treatment plants (WWTPs) to sample on a weekly basis. The populations served by these WWTPs ranged from 851 to 35,397 people.
Sampling began in July 2021, which was at least seven months after emergence of the Alpha variant (B.1.1.7). The Alpha variant first appeared in North America in late November 2020 and became the predominant SARS-CoV-2 variant by the end of March 2021. The Alpha variant diverged into multiple lineages, including B.1.1.7 derivatives like Q.3. The Q.3 lineage is present in 5,422 clinical sequences worldwide that were uploaded to the NCBI database, and a positive sample was first collected on 7-11-20 (dates used to describe samples are formatted using the American system, MM-DD-YY). The Q.3 lineage was detected in clinical samples from Michigan 16 times between 2-18-21 and 7-9-21.
It became clear that our smallest WWTP (estimated population served: 851) consistently produced higher concentrations of SARS-CoV-2 genome copies. Samples taken from this site from 2021 to 2023 were sequenced and many contained sequences that corresponded to an Alpha variant lineage. These sequences also accumulated novel mutations over time, including previously described mutations found only in cryptic lineages derived from wastewater and not in circulating clinical samples [13,14]. In this manuscript, a cryptic mutation is a mutation previously identified in cryptic lineages, that is, lineages detected in wastewater but not in clinical samples. It is important to note that many of the mutations previously identified in cryptic lineages have since been identified in clinical samples. We hypothesize that an individual was chronically infected with an Alpha variant lineage for 20-28 months. During this time, the virus adapted by accumulating novel mutations, which included previously described cryptic mutations [13,14]. Importantly, we found that the earliest sample corresponded to Alpha variant lineage Q.3, which closely aligned with clinical sequences reported in summer and fall 2021; however, the sequence diverged over time and accumulated novel mutations.
These data reveal that wastewater surveillance in small metropolitan and rural communities provides an opportunity to identify novel isolates and reconstruct genes due to lower contamination with unrelated sequences. These data also suggest that humans and other animals can chronically shed SARS-CoV-2 over many months, which is associated with accumulation of adaptive mutations. Mutations associated with chronic infection may be useful to identify individuals who are chronically infected and to drive selection of appropriate therapeutics.

Selection of sample sites
Central Michigan University (CMU) is a public research university in the City of Mt. Pleasant, Isabella County, Michigan, with an average population during the 2021-2022 academic year of 13,684 students and staff. Ten sample sites were selected on campus that collected wastewater downstream from most campus buildings, including residential halls, apartments, and academic/administrative buildings. The waste stream at these sites includes a mixture of wastewater from CMU and upstream residential areas in the City of Mt. Pleasant. Nine off-campus sites throughout the jurisdictions of the Central Michigan District Health Department (CMDHD) and Mid-Michigan District Health Department (MMDHD) were selected [12], which included the City of Mt. Pleasant, Union Township, City of Alma, City of Clare, City of Evart, three Houghton Lake townships, and Village of Marion wastewater treatment plants (WWTPs). These locations represent various land uses and population densities including urban, rural, and suburban areas, providing a large footprint of SARS-CoV-2 virus shedding in Central Michigan.

Wastewater collection
Since July 2021, wastewater samples (500-1000 mL) were collected once each week on either Monday or Tuesday from ten sanitary sewer sites and nine WWTP influent streams (after grit removal). Sanitary sewer grab samples consisted of wastewater flowing from university dormitories and buildings and the surrounding community. Influent to WWTPs was collected as grab samples or 24-hour composite samples [12]. Samples were held at 4 °C for no more than 48 h before analysis.

Virus concentration and RNA extraction
The protocol described by Flood et al. 2021 and adopted by the Michigan wastewater surveillance network was used to concentrate virus from samples and extract viral RNA [12,15]. Briefly, 100 mL wastewater or water as a negative control was mixed with 8% (w/v) molecular biology grade PEG 8000 (Promega Corporation, Madison WI) and 0.2 M NaCl (w/v). The sample was mixed slowly on a magnetic stirrer at 4 °C for 2-16 h. Following this incubation, samples were centrifuged at 4,700×g for 45 min at 4 °C. The supernatant was then removed, and the pellet was resuspended in the remaining liquid, which ranged from 1 to 3 mL. All sample concentrates were aliquoted and stored at -80 °C until further processing. Viral RNA was extracted from concentrated wastewater using the Qiagen QIAamp Viral RNA Mini Kit according to the manufacturer's protocol with previously published modifications (Qiagen, Germany) [15]. In this study, a total of 200 µL of concentrate was used for RNA extraction, resulting in a final elution volume of 80 µL. Extracted RNA was stored at -80 °C until analysis. A wastewater negative extraction control was included. To derive recovery efficiencies for each sample site, samples were inoculated with 10^6 gene copies (GC)/mL Phi6 bacteriophage (Phi6) prior to the addition of PEG and NaCl. Wastewater samples were mixed, and a 1 mL sample was reserved and stored at -80 °C. RNA was extracted as stated above.
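The recovery calculation implied by the Phi6 spike can be sketched as follows. This is a minimal illustration rather than the published calculation: it assumes recovery is the total Phi6 gene copies in the PEG concentrate divided by the total in the spiked wastewater before concentration, and the function name, parameter names, and the example concentrate volume are hypothetical.

```python
def phi6_recovery_percent(pre_gc_per_ml, post_gc_per_ml,
                          sample_volume_ml=100.0, concentrate_volume_ml=2.0):
    """Percent of spiked Phi6 recovered after PEG concentration.

    pre_gc_per_ml:  Phi6 GC/mL measured in the reserved, unconcentrated sample
    post_gc_per_ml: Phi6 GC/mL measured in the resuspended concentrate
    Volumes follow the text (100 mL sample, 1 to 3 mL concentrate); the
    formula itself is an assumption, not the authors' stated method.
    """
    if pre_gc_per_ml <= 0:
        raise ValueError("pre-concentration Phi6 signal must be positive")
    total_before = pre_gc_per_ml * sample_volume_ml
    total_after = post_gc_per_ml * concentrate_volume_ml
    return 100.0 * total_after / total_before


# Hypothetical numbers: 1e6 GC/mL before, 2.5e7 GC/mL in 2 mL of concentrate.
recovery = phi6_recovery_percent(1e6, 2.5e7, concentrate_volume_ml=2.0)
print(f"Phi6 recovery: {recovery:.1f}%")
```

Because the concentrate volume varied between samples, it would need to be recorded per sample for a calculation like this to be meaningful.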

Detection and quantification of SARS-CoV-2
A one-step RT-ddPCR approach was used to determine the copy number/20 µL of SARS-CoV-2, and data were converted to copy number/100 mL wastewater for N1 and N2 targets using the method published by Flood et al., 2021 [15]. All the primers and probes used in this study were published previously [12]. Droplet digital PCR was performed using Bio-Rad's 1-Step RT-ddPCR Advanced kit with a QX200 ddPCR system (Bio-Rad, CA, USA). Each reaction contained a final concentration of 1× Supermix (Bio-Rad, CA, USA), 20 U µL⁻¹ reverse transcriptase (RT) (Bio-Rad, CA, USA), 15 mM DTT, 900 nmol L⁻¹ of each primer, 250 nmol L⁻¹ of each probe, 1 µL of molecular grade RNAse-free water, and 5.5 µL of template RNA for a final reaction volume of 22 µL [12, 15-17]. RT was omitted for DNA targets. Droplet generation was performed by microfluidic mixing of 20 µL of each reaction mixture with 70 µL of droplet generation oil in a droplet generator (Bio-Rad, CA, USA), resulting in a final volume of 40 µL of reaction mixture-oil emulsion containing up to 20,000 droplets, with a minimum droplet count of > 9000. The resulting droplets were then transferred to a 96-well PCR plate that was heat-sealed with foil and placed into a C1000 96-deep-well thermocycler (Bio-Rad, CA, USA) for PCR amplification using the following parameters: 25 °C for 3 min, 50 °C for 1 h, 95 °C for 10 min, followed by 40 cycles of 95 °C for 30 s and 60 °C for 1 min with a ramp rate of 2 °C/s, followed by a final cycle of 98 °C for 10 min. Following PCR thermocycling, each 96-well plate was transferred to a QX200 Droplet Reader (Bio-Rad, CA, USA) for concentration determination through the detection of positive droplets containing each gene target by spectrophotometric detection of the fluorescent probe signal. All analyses were run in triplicate for each marker. To derive recovery efficiencies for each sample site, Phi6-spiked pre- and post-PEG concentration RNA samples were used to quantify Phi6 copy number using the previously published primers and probes [12]. The degree of PCR inhibition was also quantified in each sample by spiking 10 µL of 10^5 GC/mL Phi6 in a sample's Buffer AVL, including positive controls that lacked wastewater.
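Droplet readers turn positive-droplet counts into a copy number through Poisson statistics. The sketch below illustrates that standard calculation and is not taken from the instrument software: the 0.85 nL droplet volume is an assumed nominal value, the function name is hypothetical, and the droplet-count check mirrors the > 9000 minimum stated above.

```python
import math

# Assumed nominal droplet volume (0.85 nL), expressed in microliters.
DROPLET_VOLUME_UL = 0.00085


def copies_per_reaction(positive_droplets, total_droplets,
                        reaction_volume_ul=20.0,
                        droplet_volume_ul=DROPLET_VOLUME_UL,
                        min_droplets=9000):
    """Poisson-corrected target copies in one ddPCR reaction volume."""
    if total_droplets <= min_droplets:
        raise ValueError("well rejected: droplet count not above the minimum")
    if not 0 <= positive_droplets < total_droplets:
        raise ValueError("positive droplets must be >= 0 and below the total")
    # Mean copies per droplet, corrected for droplets holding more than one copy.
    mean_copies = -math.log(1.0 - positive_droplets / total_droplets)
    copies_per_ul = mean_copies / droplet_volume_ul
    return copies_per_ul * reaction_volume_ul


# Hypothetical well: 100 positive droplets out of 15,000 accepted droplets.
print(round(copies_per_reaction(100, 15000), 1))
```

The Poisson correction matters most at high target concentrations, where a single positive droplet is increasingly likely to contain more than one copy.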

Data analysis
All SARS-CoV-2 gene data were converted from GC per 20 µL reaction to GC per 100 mL wastewater sample before analysis [12,15]. Non-detects (ND) were assigned their individual sample's limit of detection for the purposes of data reporting, although any weekly on-campus or off-campus samples whose values matched the theoretical limit of detection were removed prior to statistical analysis. The limit of detection was calculated for each individual sample based on both the molecular assays' theoretical detection limits (i.e., 3 positive droplets for RT-ddPCR; the lowest standard curve concentration for RT-qPCR) and the concentration factor of each processing method examined. All wastewater data were reported to MDHHS and uploaded to the Michigan COVID-19 Sentinel Wastewater Epidemiological Evaluation Project (SWEEP) dashboard (https://www.michigan.gov/coronavirus/stats/wastewater-surveillance/dashboard/sentinel-wastewater-epidemiology-evaluation-project-sweep).
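The unit conversion and per-sample limit of detection described above can be illustrated with a short sketch. It is built only from the volumes given in the methods (5.5 µL template in a 22 µL reaction of which 20 µL is partitioned, 200 µL of concentrate eluted into 80 µL, and a 1 to 3 mL concentrate from 100 mL of wastewater). The exact published calculation may differ, no recovery correction is applied, and all function and parameter names are hypothetical.

```python
# Template RNA carried into the 20 µL that is partitioned into droplets.
TEMPLATE_UL_PER_REACTION = 5.5 * 20.0 / 22.0  # 5 µL


def gc_per_100ml(gc_per_reaction, concentrate_volume_ml,
                 eluate_ul=80.0, extracted_ul=200.0,
                 template_ul=TEMPLATE_UL_PER_REACTION,
                 sample_volume_ml=100.0):
    """Scale GC per 20 µL reaction up to GC per 100 mL of wastewater."""
    gc_per_ul_rna = gc_per_reaction / template_ul
    gc_in_eluate = gc_per_ul_rna * eluate_ul            # from 200 µL concentrate
    gc_per_ml_concentrate = gc_in_eluate / (extracted_ul / 1000.0)
    gc_in_sample = gc_per_ml_concentrate * concentrate_volume_ml
    return gc_in_sample * (100.0 / sample_volume_ml)


def limit_of_detection(concentrate_volume_ml, min_copies=3.0):
    """Per-sample LOD, treating 3 positive droplets as roughly 3 copies."""
    return gc_per_100ml(min_copies, concentrate_volume_ml)


def reported_value(gc_per_reaction, concentrate_volume_ml):
    """Non-detects are reported at the sample's own limit of detection."""
    lod = limit_of_detection(concentrate_volume_ml)
    if gc_per_reaction is None:
        return lod
    return max(gc_per_100ml(gc_per_reaction, concentrate_volume_ml), lod)


print(limit_of_detection(2.0))      # LOD for a 2 mL concentrate
print(reported_value(150.0, 2.0))   # a detected sample
```

Under these assumptions the limit of detection scales with the concentrate volume, which is why it has to be computed for each sample individually.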

Sequencing
RNA was shipped to GT Molecular (Fort Collins, CO) on dry ice. Library preparation was done using GT Molecular's proprietary method, which utilized ARTIC 4.1 primers for SARS-CoV-2 amplicon generation (https://artic.network/ncov-2019). Amplicons were pooled and sequenced on a MiSeq using 2 × 150 bp reads. FASTQ files were analyzed using GT Molecular's bioinformatics pipeline, and variant-calling was performed using a modified and proprietary version of Freyja [18]. FASTQ files for each sample listed in Table 1 are available in the NCBI SRA database (Submission ID: SUB13897431; BioProject ID: PRJNA1027333).

Spike reconstruction and identification of novel mutations
FASTQ files from 11-9-21, 9-12-22, 4-24-23, and 5-1-23 contained reads that spanned the entire Spike protein, lacked contamination with other variants of concern based on variant calling, and had high relative abundance of the Alpha variant lineage B.1.1.7 derivative. This allowed for reconstruction of a consensus Spike gene for each of the above wastewater samples. Specifically, we uploaded FASTA-formatted .txt files into Galaxy (https://usegalaxy.org/) that represented the SARS-CoV-2 reference Spike gene. We then uploaded each of the paired-end FASTQ files for each wastewater sample. The Bowtie2 program was used to map reads against each reference sequence, creating individual .bam files per sample. The default settings were used for analysis. The Convert Bam program was then used to convert .bam files to FASTA multiple sequence alignments. Multiple sequence alignment files were uploaded to MEGA (https://www.megasoftware.net/) and converted to amino acid sequence. The consensus amino acid sequence from each of these samples was manually reconstructed and then aligned with the SARS-CoV-2 Spike reference sequence and a consensus Alpha variant lineage Q.3 sequence derived from 16 clinical samples collected in Michigan from 2-18-21 to 7-9-21. The Q.3 lineage was chosen because the earliest wastewater sample that tested positive for SARS-CoV-2 Alpha variant (i.e., 10-26-21) was Alpha variant lineage Q.3 based on the GT Molecular variant calling pipeline. Mutations that were present in wastewater samples but not in the SARS-CoV-2 Spike reference sequence or clinical sample were characterized as novel mutations. FastQC was used to quantify the total number of reads in each FASTQ file, the total number of reads that aligned to the reference Spike, the read length, and the number of poor-quality sequences (Supplementary Table 1).
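The consensus reconstruction above was done manually in MEGA. As an illustration only, the same logic can be sketched programmatically: take the most common residue in each alignment column, then report positions where the wastewater consensus differs from both the reference and the clinical consensus. The function names, the use of "-" for uncovered positions, and the toy sequences are assumptions, and this simplified version does not call deletions.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Majority residue per column; gaps are treated as missing coverage."""
    length = len(aligned_reads[0])
    residues = []
    for i in range(length):
        counts = Counter(r[i] for r in aligned_reads if r[i] != gap)
        residues.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(residues)


def novel_mutations(wastewater, reference, clinical, gap="-"):
    """Substitutions absent from both the reference and clinical sequences."""
    found = []
    for pos, (w, ref, clin) in enumerate(
            zip(wastewater, reference, clinical), start=1):
        if w != gap and w != ref and w != clin:
            found.append(f"{ref}{pos}{w}")  # e.g. R403K style notation
    return found


# Toy four-residue alignment, not real Spike data.
reads = ["MFVK", "MFVK", "MF-R", "-YVK"]
sample_consensus = consensus(reads)
print(sample_consensus)
print(novel_mutations(sample_consensus, reference="MFVR", clinical="MFVR"))
```

A position where the wastewater consensus matches the clinical Q.3 consensus is not reported, which reflects the definition of a novel mutation used in this section.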

Novel and cryptic mutation hotspot analyses
We identified novel mutations as described above. Previous literature also identified cryptic sequence hotspots in SARS-CoV-2 Spike [13,14]. We tracked the percent prevalence of novel and cryptic mutations in wastewater samples that were positive for the Alpha variant lineage. Specifically, we uploaded FASTA-formatted .txt files into Galaxy (https://usegalaxy.org/) that represented the SARS-CoV-2 reference Spike. We then uploaded each of the paired-end FASTQ files for each wastewater sample. The Bowtie2 program was used to map reads against the reference sequence. The default settings were used for analysis. The Convert Bam program was then used to convert .bam files to FASTA multiple sequence alignments. Multiple sequence alignment files were uploaded to MEGA (https://www.megasoftware.net/) and converted to amino acid sequence for open-reading frame analysis. Novel and cryptic mutations were identified manually, and the column of reads at each mutated position was copied and pasted into Excel. The column was selected, and the Analyze Data tool was used to calculate the percent prevalence of the novel or cryptic mutation. This was repeated for each novel and cryptic mutation across all samples positive for the Alpha variant lineage, and the percent prevalence data were represented as heatmaps. Novel mutations present in the 2021, 2022, and 2023 consensus Spike proteins were mapped onto the furin-cleaved Spike protein of SARS-CoV-2 with one RBD erect using UCSF Chimera [19]. This atomic structure was selected because it had the greatest resolution of each amino acid across the Spike protein and allowed mapping of most novel mutations.
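The per-column tally performed in Excel can be illustrated with a short sketch. It assumes aligned amino acid reads with "-" marking uncovered positions, and it assumes the three-read threshold applies to the number of reads covering the position; the function name and toy reads are hypothetical.

```python
def percent_prevalence(aligned_reads, position, residue,
                       min_reads=3, gap="-"):
    """Percent of covering reads that carry `residue` at a 1-based position.

    Returns None when fewer than `min_reads` reads cover the position, which
    would correspond to an empty heatmap cell under the assumption above.
    """
    column = [r[position - 1] for r in aligned_reads
              if len(r) >= position and r[position - 1] != gap]
    if len(column) < min_reads:
        return None
    return 100.0 * column.count(residue) / len(column)


# Toy two-residue reads, not real data.
reads = ["AK", "AK", "AR", "-K"]
print(percent_prevalence(reads, position=2, residue="K"))
print(percent_prevalence(reads[:2], position=2, residue="K"))
```

Repeating this for every tracked mutation and every Alpha-positive sample yields the matrix of values that a heatmap displays.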

Chronic shedding of an Alpha variant lineage at a rural WWTP
Wastewater samples were collected between July 2021 and June 2023 from ten on-campus sanitary sewer sites and nine WWTP influent streams. SARS-CoV-2 genome copies per 100 mL wastewater were determined each week and reported to MDHHS. One site was notable for higher peaks of virus shedding, which culminated in a peak that was 4 logs higher than the mean for all sites, although high peaks of activity had been observed since 9-21-21 (Fig. 1). In order to identify the SARS-CoV-2 variant responsible for this activity, RNA extracted from stored wastewater concentrates was shipped to GT Molecular and their NGS and variant calling pipeline was used. RNA from the site of interest was analyzed, and RNA from neighboring sites was analyzed as a control. The site of interest contained high relative abundance of Delta variant lineage AY.25.1 at the first time point tested (i.e., 9-21-21) (Fig. 1; Table 1). This corresponded to the beginning of the Delta variant wave in Central Michigan [12]. The site of interest began shedding the Alpha variant lineage during the next two time points tested (i.e.,   2). The site of interest had high relative abundance of Omicron variant lineages during the next two time points tested (i.e., 3-14-22 and 4-25-22) (Fig. 1; Table 1). This corresponded to the end of the first Omicron wave in Central Michigan [12]. The Alpha variant lineage became the dominant isolate in all remaining 2022 and 2023 wastewater samples tested from the site of interest, with relative abundance ranging from 47.1 to 98.0%. The Alpha variant lineage was also detected in the closest neighboring WWTP on 4-10-23, which corresponded to a large peak in virus shedding at that site (Fig. 1; Table 1). Other sites contained Omicron variant lineages BG.5, XBB.1.5, XBB.1.5.23, XBB.1.28, XBB.1.5.1, XBB.1.5.17, XBB.1.5.49, and Delta variant lineage DT.2 at varying relative abundance (Table 1).

Accumulation of novel mutations in the RBD and NTD
We reasoned that chronic shedding of SARS-CoV-2 would lead to accumulation of novel or cryptic mutations that do not align with sequences identified in most clinical and wastewater samples. Alignment of reconstructed consensus genes with the SARS-CoV-2 Spike reference gene and a consensus Alpha variant lineage clinical sequence revealed that the Spike proteins harbored 9 novel mutations in 2021, 25 novel mutations in 2022, and 38 novel mutations in 2023 (Supplementary Fig. 1).
The closest neighboring WWTP also contained the Alpha variant lineage (Table 1). Alignment of the reconstructed Spike gene from CE 4-10-23 revealed shared mutations with reconstructed Spike genes from 11-9-21, 9-12-22, and 5-1-23, and four unique mutations: L24S, H245Y, V445A, and Y1155F (Supplementary Fig. 2). The mutations shared with the reconstructed Spike proteins from 11-9-21, 9-12-22, and 5-1-23 suggested that CE 4-10-23 shared a common ancestor with more recent isolates. Most of the mutations reside in the Spike RBD and NTD. These domains are critical for host receptor binding and contain key epitopes leveraged by the adaptive immune system to control and prevent repeat infection. A striking mutation that developed in 2023 was R403K. This converted the RGD receptor binding motif to KGD, which is present in SARS-CoV-1, a historically more lethal yet less transmissible virus [20-22]. This is particularly interesting since R403 is highly conserved in SARS-CoV-2 Spike and only 294 of ~3.4 million sequences recorded on GISAID contained a conservative change of R403K [23]. Many other mutations have also been previously characterized. For instance, engineering the A372T mutation into SARS-CoV-2 reduced binding to ACE2 and enhanced replication in human lung cells [24]. K444R, V445A, G446D, Y449N, L452Q, N460K, S477N, and E484V (and cryptic mutation E484A) have been associated with resistance to antibody-mediated neutralization, and N460K was previously observed during a persistent infection in an immunocompromised patient [14, 25-33]. E484V also reduced ACE2 binding [33]. L452Q had higher binding to soluble ACE2 [31]. Y453F increased binding to mink ACE2 [34]. Q493K increased binding to mouse ACE2 and developed in an immunocompromised patient undergoing convalescent plasma treatment [35-37]. Q498L was predicted to lower stability of the Spike and ACE2 interaction, but no studies are available to confirm this prediction [38]. G504D is associated with immune evasion; however, the G504D substitution is rarely observed in SARS-CoV-2 strains, with a mutation rate below 0.002% [29,39]. Y505H was in all lineages of the Omicron variant, suggesting that it enhanced immune evasion and receptor binding [40].
In the NTD, H49Y impacts Spike structure, influences binding of several antiviral compounds, and increases resistance to vaccine sera [25,41]. T76I increases infectivity in the Lambda variant, although it is suggested that it behaves as a compensatory mutation [31]. T76I also affects antibody binding and immune escape [42]. V143Del is present in Omicron variants, suggesting that it enhanced immune evasion and receptor binding [43]. Y144Del may play a role in ACE2 receptor binding or neutralizing antibody escape, and deletions in this region were identified in immunocompromised patients [37,44,45]. T19K, W64R, H66Q, I68Del, K147T, S151I, G181E, N196S, Y248S, and G257D substitutions have not been characterized. Toward the C-terminus, L828F is a highly prevalent cryptic mutation of unknown origin, although it is likely due to shedding from chronically infected humans or animals [13,14]. D571G, I587V, V772A, T941S, V1176F, K1191N, and Q1201K substitutions have not been characterized.
Collectively, the mutations that have accumulated in the Spike gene are likely a response to the host's innate and adaptive immune systems, and perhaps due to long-term persistence in an immunosuppressed patient and adaptation to any prophylactic or targeted drugs used to clear the infection [37, 45-47]. The presence of previously identified cryptic mutations suggests that these mutations may predict a chronic infection. The goal of this work is not to identify the person(s) responsible for chronically shedding this virus. However, we would like to highlight the potential of wastewater surveillance and possibly fecal testing for identification of long COVID. This would be particularly useful if convergent mutations emerge during a chronic infection that are predictive of this condition. The spectrum of mutations might guide appropriate selection of antivirals and antibody-based therapies. Additional mutations outside of the Spike gene were also present in these samples but not analyzed for this manuscript. It is likely that mutations outside of the Spike gene are also important to facilitate chronic infection.
In summary, these data support that an individual can be chronically infected with SARS-CoV-2 over many months and possibly a few years. During this time, SARS-CoV-2 can accumulate many mutations in the Spike gene, which concentrate in the RBD and NTD. Further research is needed to determine if these mutations are predictive of chronic infection and if they can be used as a biomarker in individuals with long COVID and leveraged to tailor selection or development of pharmaceutical therapies. Additionally, this study shows that small WWTPs can enhance the resolution of rare biological events and allow for total reconstruction of viral genes and their corresponding proteins.

Fig. 1 SARS-CoV-2 genome copies (GC)/100 mL wastewater detected at each weekly sample site from July 2021 to June 2023. Two-letter site codes and dates are shown that correspond to sequenced samples, and the variant that was identified in the highest relative abundance is indicated in parentheses. The colors and shapes associated with each sample are located in the graphical legend.

Fig. 2 Heatmap showing the percent prevalence of novel and previously identified cryptic mutations (*) in each wastewater sample that was positive for the Alpha variant lineage [13,14]. Empty cells represent mutations that had fewer than 3 reads.

Table 1 GT Molecular variant calling
a Relative abundance of variants of concern (VOC) as a percentage
b Relative abundance of VOC lineages as a percentage